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Abstract 


This  project  addresses  the  statistical  inverse  problem  of  reconstruction  of  an  uncertain 
shape  of  a  scatterer  or  properties  of  a  medium  from  noisy  observations  of  scattered  wave- 
fields.  The  Bayesian  solution  of  this  inverse  problem  yields  a  posterior  pdf,  requiring  the 
solution  of  the  forward  wave  equation  to  evaluate  the  density  for  any  point  in  parameter 
space.  The  standard  approach  is  to  sample  this  pdf  via  an  MCMC  method  and  then  com¬ 
pute  statistics  of  the  samples.  However,  standard  MCMC  methods  view  the  underlying 
parameter-to-observable  map  as  a  black  box,  and  thus  do  not  exploit  its  structure,  hence 
becoming  prohibitive  for  high  dimensional  parameter  spaces  and  expensive  simulations. 

We  have  developed  a  Langevin-accelerated  MCMC  method  for  sampling  high-dimensional 
PDE-based  probability  densities.  The  method  builds  on  previous  work  in  Langevin  dynam¬ 
ics,  which  uses  gradient  information  to  guide  the  sampling  in  useful  directions,  improving 
convergence  rates.  We  have  extended  the  Langevin  idea  to  exploit  local  Hessian  informa¬ 
tion,  leading  to  a  stochastic  version  of  Newton’s  method.  We  have  also  begun  to  analyze  the 
spectral  structure  of  the  Hessian  for  inverse  scattering  problems.  Applications  to  model  in¬ 
verse  medium  scattering  problems  indicate  several  orders  of  magnitude  improvement  over 
a  reference  black-box  MCMC  method. 

Background 

The  overall  goal  of  this  project  is  to  create  systematic,  rigorous,  and  scalable  algorithms  for 
quantifying  uncertainties  in  inverse  wave  scattering  problems.  These  uncertainties  reflect 
our  incomplete  knowledge  of  the  medium  in  which  the  waves  propagate  (inverse  medium 
scattering  problem)  or  the  shape  of  a  scatterer  (inverse  shape  scattering  problem).  The  prob¬ 
lem  of  inferring  an  uncertain  medium  or  shape  from  observations  of  scattered  wavefields 
is  fundamentally  a  statistical  inverse  problem.  Our  lack  of  knowledge  results  from  noisy 
measurements,  sparse  observers,  uncertain  forward  models,  and  uncertain  prior  model  pa¬ 
rameter  information.  Uncertainty  in  the  reconstructed  model  parameters  is  a  fundamental 
feature  of  ill-posed  inverse  problems. 

The  deterministic  approach  to  the  inverse  scattering  problem,  which  amounts  to  minimiz¬ 
ing  a  regularized  data  misfit  function,  is  incapable  of  accounting  for  uncertainties  in  the 


solution  of  the  inverse  problem.  Bayesian  inference  provides  a  systematic  framework  for 
incorporating  uncertainties  in  observations,  forward  models,  and  prior  knowledge  to  quan¬ 
tify  uncertainties  in  the  model  parameters.  Suppose  the  relationship  between  output  observ¬ 
ables  y  (such  as  waveforms  at  sensor  locations)  and  uncertain  model  parameters  p  (such  as 
those  describing  a  wave  speed  of  a  heterogeneous  medium  or  shape  of  a  scatterer)  is  de¬ 
noted  by  j/  =  /(p,  e),  where  e  represents  noise  due  to  measurement  and/or  modeling  errors. 
In  other  words,  given  the  model  parameters  p  and  noise  e,  the  function  /(p,  e)  solves  the 
forward  (acoustic,  elastic,  or  electromagnetic  wave  propagation)  problem  to  yield  y,  the 
predicted  outputs  at  the  measurement  locations  (and  time  instants).  Suppose  also  that  we 
have  the  prior  probability  density  7rprior(p)>  which  encodes  the  confidence  we  have  in  prior 
information  on  the  unknown  model  parameters  (i.e.  independent  of  present  observations), 
and  the  likelihood  function  7r(yobs|p)>  which  describes  the  conditional  probability  that  the 
model  parameters  p  gave  rise  to  the  actual  measurements  Pobs-  Then  Bayes’  theorem  of  in¬ 
verse  problems  expresses  the  posterior  probability  density  of  the  model  parameters,  TTpost, 
given  the  data  xjohs,  as  the  conditional  probability 

7rpost(p)  7r(p|yobs)  OC  7rprior(p)  7r(yobs|p)-  (1) 

Expression  (1 )  provides  the  statistical  solution  of  the  inverse  problem  as  a  probability  den¬ 
sity  for  the  model  parameters  p. 

While  it  is  easy  to  write  down  the  expression  (1)  for  the  posterior  probability  density, 
making  use  of  this  expression  poses  a  challenge,  because  the  posterior  probability  density 
is  a  surface  in  high  dimensions  (equal  to  the  number  of  model  parameters  p),  and  because 
the  solution  of  the  forward  problem  (i.e.,  computing  /(p)  given  p)  is  required  to  evaluate 
the  probability  of  any  point  in  parameter  space.  Straightforward  grid-based  sampling  is  out 
of  the  question  for  anything  other  than  a  few  parameters  and  cheap  forward  simulations. 
Special  sampling  techniques,  such  as  Markov  chain  Monte  Carlo  (MCMC)  methods,  have 
been  developed  to  generate  sample  ensembles  that  typically  require  many  fewer  points  than 
grid-based  sampling.  Still,  when  the  model  parameters  represent  a  (suitably-discretized) 
field  (such  as  scatterer  shape  or  medium  wave  speed),  and  when  the  forward  PDE  requires 
hours  to  solve  on  a  parallel  computer  (such  as  mid-to-high  frequency  wave  propagation), 
the  MCMC  framework  becomes  completely  intractable. 

The  central  difficulty  in  scaling  up  conventional  MCMC  for  large-scale  forward  simula¬ 
tions  and  high-dimensional  parameter  spaces  is  that  this  is  a  purely  black-box  approach, 
i.e.  it  does  not  exploit  the  structure  of  the  parameter-to-observable  map  /(p).  Twenty  years 
of  advances  in  algorithms  for  deterministic  large-scale  PDE-constrained  optimization  have 
taught  us  that  making  maximal  use  of  derivative  information  can  greatly  speed  up  the  search 
process  for  extremum  points.  The  goal  of  this  project  is  to  overcome  the  intractability  of 
conventional  methods  for  statistical  inverse  scattering  problems  by  developing  scalable  al¬ 
gorithms  that  exploit  the  structure  of  inverse  wave  propagation  operators.  Our  work  this 
year  has  focused  on  developing  preconditioned  Langevin  methods  that  exploit  this  struc¬ 
ture  to  greatly  speed  up  sampling.  This  has  involved  the  analysis  and  construction  of  fast 
algorithms  for  approximating  the  inverse  Hessian  for  inverse  wave  scattering  problems. 


2 


Status/Progress:  Fast  Hessian-preconditioned  Langevin  samplers  (UT  Austin) 

This  past  year  we  have  been  developing  fast  sampling  methods  that  build  on — and  signifi¬ 
cantly  extend — ideas  from  Langevin  dynamics,  which  use  gradient  information  to  acceler¬ 
ate  sampling  of  a  target  density.  The  Langevin  equation  is  a  stochastic  differential  equation, 

dPt  =  AV  logTTpostC^f  +  (2) 

with  TTpostCp)  tis  an  invariant  density.  Here,  W t  is  the  i.i.d.  vector  of  standard  Brownian 
motions.  Preconditioning  by  a  symmetric  positive  definite  operator  A  preserves  the  invari¬ 
ance  of  the  density.  In  practice,  we  discretize  in  time  with  timestep  Af,  yielding  (e.g.  for 
explicit  Euler)  the  update 

Pfc+i  =  Pk  +  AV  log  TTpost Af  -I-  \/2AfA'/^A/'(0, 1)  (3) 

where  AA(0, 1)  is  the  i.i.d.  standard  normal  density.  Discretization  in  time  can  add  bias, 
so  we  use  the  Langevin  steps  as  proposals  for  the  Metropolis-Hastings  algorithm.  The 
form  (3)  shows  immediately  the  connection  with  deterministic  optimization  methods:  the 
gradient  term  VlogTTpost  is  a  steepest  ascent  direction  for  the  posterior  density.  In  its 
absence  (and  in  the  absence  of  preconditioning,  i.e.  A  =  /)  we  recover  a  Gaussian  random 
walk  from  the  last  term  in  (2).  The  addition  of  the  gradient  term  drives  the  sampling  in  (the 
locally  steepest)  direction  of  higher  probability.  However,  steepest  descent  is  a  poor  choice 
for  search  directions  in  large-scale  optimization  (particularly  for  anisotropic  pdfs),  and  we 
seek  to  improve  on  it. 

Taking  the  preconditioner  A  as  the  inverse  of  the  Hessian  of  logTTpost.  we  obtain  the 
stochastic  equivalent  of  Newton’s  method.  In  the  common  case  of  Gaussian  additive  noise 
and  prior,  the  (negative)  log  of  the  posterior  density  is  simply  the  “regularized”  misfit  func¬ 
tion  (the  sum  of  the  data  misfit  and  prior/regularization  term)  that  deterministic  inverse 
methods  seek  to  minimize.  Thus,  similar  to  Newton’s  locally-quadratic  approximation  of 
the  objective,  the  Hessian-preconditioned  Langevin  step  makes  a  locally-Gaussian  approxi¬ 
mation  of  TTpost-  This  endows  the  sampling  process  with  information  on  the  curvature  of  the 
posterior  density  surface,  which  is  crucial  in  high  dimensions.  We  expect  this  to  result  in  a 
need  for  substantially  fewer  sampling  points  relative  to  a  black-box  MCMC  method,  just  as 
deterministic  Newton  requires  substantially  fewer  iterations  to  find  the  optimum  compared 
to  a  derivative-free  optimization  method. 

Moreover,  it  can  be  shown  that  in  the  limiting  case  when  the  posterior  density  TTpost 's  in 
fact  Gaussian  (e.g.  when  the  inverse  problem  is  linear  and  the  noise  is  additive  and  Gaus¬ 
sian),  this  .so-called  stochastic  Newton  method  not  only  samples  the  target  density  at  long 
times,  but  accurately  samples  from  TTpost  at  every  time  step.  This  means  that  the  Metropolis- 
Hastings  criterion  will  accept  all  of  the  proposed  sample  points,  and  that  a  minimum  num¬ 
ber  of  points  are  needed  to  accurately  sample  from  the  given  distribution.  For  densities 
that  are  not  Gaussian,  stochastic  Newton  will  still  provide  a  substantial  speedup  over  a 
conventional  random  walk,  since  a  Gaussian  approximation  (based  on  a  local  quadratic 
approximation  of  logTTpost.  or  equivalently  a  linearized  approximation  of  the  inverse  prob¬ 
lem)  will  generally  prove  to  yield  more  useful  information  on  the  behavior  of  TTpost  than  a 
standard  normal  density  approximation  (or  other  heuristic)  will. 


3 


^  T 


We  have  developed  a  preliminary  implementation  of  the  stochastic  Newton  method,  and 
applied  it  to  solve  nonlinear  inverse  medium  and  shape  scattering  problems  in  one  and 
two  dimensions.  For  example,  for  a  65-parameter  ID  inverse  medium  problem.  Figure 
1  indicates  just  O(IO^)  samples  are  necessary  to  adequately  sample  the  (non-Gaussian) 
posterior  density,  while  a  reference  (non-derivative)  MCMC  method  (Delayed  Rejection 
Adaptive  Metropolis  (DRAM))  is  nowhere  near  converged  after  even  0(10^)  samples.  The 
performance  of  unpreconditioned  Langevin  MCMC  is  similar  to  that  of  DRAM,  indicat¬ 
ing  the  crucial  role  of  the  Newton  direction  vs.  steepest  descent.  Moreover,  because  the 
(inverse)  Hessian  captures  the  (local)  covariance  structure  of  the  posterior  density,  this 
orders-of-magnitude  speedup  is  expected  to  become  even  larger  as  the  parameter  dimen¬ 
sion  increases. 


MPSRF  Plots  fof  650  Chains 


Figure  1:  Comparison  of  number  of  points  taken  for  sampling  posterior  density  for  a  65-dimensional  inverse 
scattering  problem  to  identify  the  distribution  of  elastic  moduli  for  a  layered  medium,  from  reflected  waves. 
DRAM  (black),  unpreconditioned  Langevin  (blue),  and  Stochastic  Newton  (red)  sampling  methods  are  com¬ 
pared.  Convergence  indicator  is  multivariate  potential  scale  reduction  factor  (MPSRF)  for  which  a  value  of 
unity  indicates  convergence.  Stochastic  Newton  requires  three  orders  of  magnitude  fewer  sampling  points. 


Status/Progress:  Fast  approximations  of  Hessian  operators  (Georgia  Tech) 

This  past  year  we  have  also  been  working  on  fast  approximations  for  Hessian  operators, 
which  are  critical  component  of  fast  sampling  algorithms  for  statistical  inverse  problems. 
Hessian  operators  are  mainly  determined  by  three  characteristics  of  the  inverse  problem: 
the  parameters,  the  underlying  PDF,  and  the  observation  operator.  Previously  we  have 
analyzed  Hessians  for  definite  elliptic  and  parabolic  equations,  for  different  observation 
operators  for  the  inverse  source  and  inverse  medium  problems  (see  publications).  Recently 
we  have  started  working  with  indefinite  elliptic  operators,  namely  the  Helmholtz  operator 
for  inverse  medium  problems.  We  consider  the  problem  of  the  inversion  of  the  scalar  wave 
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equation  in  a  lossy  medium.  In  particular,  we  focus  on  broadband  multi-point  illumination 
problems  for  low  frequency  regimes. 


More  specifically,  we  consider 


-V^u(r,f)  -I-  <T(r) 


du{r,  t) 
dt 


+ 


1  d‘^u{r,  t) 

c2(r)  dt^ 


S{r,t) 


(4) 


where  u  denotes  the  state,  c  the  speed  of  .sound,  a  viscous  dissipation,  and  5  the  source 
term.  If  all  variables  are  time-harmonic  with  a  fixed  angular  frequency  u,  in  the  frequency 
domain  Equation  (4)  becomes 


-V2u(r)  + 


iu!a{r) 


u{t) 


S{v,u). 


(5) 


We  wish  to  design  a  fast  method  for  the  recovery  of  either  c(r)  or  <T(r)  from  the  values  of  the 
scattering  data  measured  on  Na  detectors.  To  generate  the  set  of  scattered  measurements, 
a  small  number  of  point  sources  (Ng)  will  be  considered.  The  positions  of  the  sources  will 
be  denoted  (1  <  f  <  Ng).  For  each  source,  we  consider  Nf  frequencies  co. 

We  use  an  integral  equation  formulation  for  the  forward  problem  and  a  Born  approximation 
for  the  inverse  problem,  leading  to  an  ill-posed  linear  least-squares  problem.  If  Nj  is 
the  number  of  excitation  frequencies,  Ng  the  number  of  different  locations  for  the  point 
illuminations,  Nd  the  number  of  detectors,  and  N  the  parametrization  for  the  scatterer,  a 
direct  SVD  will  have  0{{NaNdNf)‘^N)  computational  cost.  We  have  developed  a  fast  SVD 
method  that,  by  introducing  a  controllable  error  in  the  factorization,  brings  this  cost  down 
to  0{NfN)  thus,  providing  orders  of  magnitude  speed-up  over  the  generic  factorization 
algorithm.  The  fast  SVD  approach  consists  of  the  following  components:  a  plane  wave 
expansion  of  the  forward  operator;  a  fast  randomized  SVD  for  the  single-source-many- 
frequencies  problem;  and  a  multi-step  combination  of  the  single  source  SVDs  to  construct 
the  overall  factorization.  We  are  currently  verifying  and  validating  our  method  and  are 
examining  ways  to  extend  it  to  higher  frequency  regimes. 
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